library(tidyverse)
library(gapminder)
library(maps)
library(WDI)
(df <- gapminder)
asean <- c("Brunei", "Cambodia", "Laos", "Myanmar", "Philippines", "Indonesia", "Malaysia", "Singapore")
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = year, y = gdpPercap, col = country)) + geom_line()
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, col = country)) + geom_point()
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, col = country)) +
  geom_point() + coord_trans(x = "log10", y = "identity")
\(\log_{10}{100}\) = 2, \(\log_{10}{1000}\) = 3, \(\log_{10}{10000}\) = 4
\(10^{2.5}\) = 316.227766, \(10^{3}\) = 1000, \(10^{3.5}\) = 3162.2776602,
\(10^{4}\) = 10000, \(10^{4.5}\) = 31622.7766.
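The values above can be checked directly in R. This is a minimal sketch using only base R; the names `powers` and `breaks` are illustrative and not part of the original notes.

```r
# Exponents at half-step intervals, matching the list above
powers <- seq(2.5, 4.5, by = 0.5)

# Corresponding values on the original (GDP per capita) scale
breaks <- 10^powers
round(breaks, 2)

# log10() is the inverse: it maps the values back to their exponents
log10(c(100, 1000, 10000))
log10(breaks)
```

On a log10 axis, equal distances therefore correspond to equal ratios of GDP per capita rather than equal differences.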
df_wdi <- WDI(
  country = "all",
  indicator = c(lifeExp = "SP.DYN.LE00.IN", pop = "SP.POP.TOTL", gdpPercap = "NY.GDP.PCAP.KD")
)
df_wdi
df_wdi_extra <- WDI(
  country = "all",
  indicator = c(lifeExp = "SP.DYN.LE00.IN", pop = "SP.POP.TOTL", gdpPercap = "NY.GDP.PCAP.KD"),
  extra = TRUE
)
df_wdi_extra
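With `extra = TRUE`, the result gains country-level columns such as `region` and `income`, and regional or income-group aggregates (for example "World") are marked with `region == "Aggregates"`. The following is a minimal sketch of using that column to keep only individual countries. The helper `drop_aggregates()` and the toy data frame with placeholder values are hypothetical, introduced here so the example runs without a download.

```r
library(dplyr)

# Hypothetical helper: keep only rows that describe individual countries.
# Note that filter() also drops rows whose region is NA.
drop_aggregates <- function(wdi_df) {
  wdi_df %>% filter(region != "Aggregates")
}

# Toy stand-in for df_wdi_extra (placeholder values, not real WDI data)
toy <- data.frame(
  country = c("Japan", "World", "Indonesia"),
  region  = c("East Asia & Pacific", "Aggregates", "East Asia & Pacific"),
  lifeExp = c(1, 2, 3)
)

countries_only <- drop_aggregates(toy)
countries_only
```

Applied to the downloaded data, the call would be `drop_aggregates(df_wdi_extra)`.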
EDA is an iterative cycle that helps you understand what your data says. When you do EDA, you:
Generate questions about your data
Search for answers by visualising, transforming, and/or modeling your data
Use what you learn to refine your questions and/or generate new questions
EDA is an important part of any data analysis. You can use EDA to make discoveries about the world; or you can use EDA to ensure the quality of your data, asking questions about whether the data meets your standards or not.
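One pass through this cycle can be sketched with the gapminder data. The question asked here (how much life expectancy changed in each country between the first and last survey years), the country subset `sea`, and the object name `life_change` are illustrative assumptions, not part of the original notes.

```r
library(dplyr)
library(gapminder)

# 1. Question: how much did life expectancy change in each country?
sea <- c("Cambodia", "Indonesia", "Malaysia", "Myanmar", "Philippines", "Singapore")

# 2. Search for an answer by transforming the data
life_change <- gapminder %>%
  filter(country %in% sea) %>%
  group_by(country) %>%
  summarise(
    first_year = min(year),
    last_year  = max(year),
    change     = lifeExp[which.max(year)] - lifeExp[which.min(year)]
  ) %>%
  arrange(desc(change))

life_change

# 3. Refine: the ranking suggests a follow-up question, for example
#    whether the change is related to growth in gdpPercap over the same period.
```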
The term “Open Data” has a very precise meaning. Data or content is open if anyone is free to use, re-use or redistribute it, subject at most to measures that preserve provenance and openness.
WDI(country = "all",
    indicator = "NY.GDP.PCAP.KD",
    start = 1960,
    end = 2020,
    extra = FALSE,
    cache = NULL)
c('women_private_sector' = 'BI.PWK.PRVS.FE.ZS')
library(WDI)
WDIsearch(string = "NY.GDP.PCAP.KD",
field = "indicator", cache = NULL)
# wdi_cache is the object returned by WDIcache(), created further below
WDIsearch(string = "population",
field = "name", short = FALSE, cache = wdi_cache)
WDIsearch(string = "NY.GDP.PCAP.KD",
field = "indicator", short = FALSE, cache = NULL)
WDIsearch(string = "gdp",
field = "name", short = TRUE, cache = NULL)
WDIbulk downloads the bulk-download zip file from the WDI site and returns a list containing six data frames: Data, Country, Series, Country-Series, Series-Time, FootNote.
timeout: integer, maximum number of seconds to wait for the download
wdi <- WDIbulk(timeout = 600)
wdi$Data
wdi$Country
wdi$Series
wdi$`Country-Series`
wdi$`Series-Time`
wdi$FootNote
Download an updated list of available WDI indicators from the World Bank website. Returns a list for use in the WDIsearch function.
wdi_cache <- WDIcache()
Downloading all series information from the World Bank website can
take time. The WDI package ships with a local data object with
information on all the series available on 2012-06-18. You can update
this database by retrieving a new list using WDIcache, and
then feeding the resulting object to WDIsearch via the
cache argument.
wdi_cache
List of 2 data frames
The first character matrix includes a full list of WDI series. This list is updated semi-regularly. Users can refresh the list manually using the ‘WDIcache()’ function and search in the updated list using the ‘cache’ argument.
glimpse(WDI_data)
WDI_data$series
WDI_data$country
WDI_data$country %>% filter(country == "Japan")
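Because `WDI_data` ships with the package, it can also be searched offline with ordinary data-frame tools. This is a sketch that assumes the `series` element has `indicator` and `name` columns. The `as.data.frame()` call is a precaution in case an installed version stores a matrix, and the name `hits` is illustrative.

```r
library(WDI)
library(dplyr)

# Coerce to a data frame in case the installed version stores a matrix
series_tbl <- as.data.frame(WDI_data$series, stringsAsFactors = FALSE)

# Indicators whose name mentions life expectancy
hits <- series_tbl %>%
  filter(grepl("life expectancy", name, ignore.case = TRUE)) %>%
  select(indicator, name)

hits
```

This is roughly what `WDIsearch()` does when `cache = NULL`: it searches the local list rather than a freshly downloaded one.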
WDIsearch(string = "gdp",
field = "name", short = FALSE, cache = wdi_cache)
Find indicators:
WDIsearch(string = "gdp", field = "name", short = FALSE, cache = NULL)
Indicator: EN.ATM.CO2E.PC
co2pcap <- WDI(country = "all", indicator = "EN.ATM.CO2E.PC", start = 1960, end = NULL, extra = TRUE, cache = wdi_cache)
co2pcap
readr, readxl, ggplot2; Public Data, WDI, WIR, etc.
There is no rule about which questions you should ask to guide your research. However, two types of questions will always be useful for making discoveries within your data. You can loosely word these questions as:
What type of variation occurs within my variables?
What type of covariation occurs between my variables?
The rest of this tutorial will look at these two questions. To make the discussion easier, let’s define some terms…
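Assuming the two question types concern variation within a single variable and covariation between two variables, each maps onto a basic plot type. The sketch below uses the gapminder data; the choice of variables, the bin width, and the object names are illustrative assumptions.

```r
library(ggplot2)
library(gapminder)

# Use the most recent survey year only
latest <- subset(gapminder, year == max(year))

# Variation: how is life expectancy distributed across countries?
p_variation <- ggplot(latest, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)

# Covariation: how does life expectancy move with GDP per capita?
p_covariation <- ggplot(latest, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

p_variation
p_covariation
```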
ggplot2 Basics: visualization
ggplot2 Extra